Multi-document Summarization using Tensor Decomposition
Authors
Abstract
The problem of extractive text summarization for a collection of documents is defined as selecting a small subset of sentences so that the content and meaning of the original document set are preserved as well as possible. In this paper we present a new model for extractive summarization, in which we strive to obtain a summary that preserves as much of the information coverage of the original document set as possible. We construct a new tensor-based representation that describes the given document set in terms of its topics. We then rank the topics via tensor decomposition and compile a summary from the sentences of the highest-ranked topics.
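The abstract does not specify how the tensor is built or which decomposition is used, so the following is only a rough sketch of the general pipeline (represent, rank topics, pick sentences) under stated assumptions. It assumes a term x sentence-slot x document count tensor and swaps in a mode-1 unfolding followed by an SVD (the first step of a higher-order SVD) as a stand-in for the paper's decomposition. The names `build_tensor` and `summarize` are hypothetical and do not come from the paper.

```python
import numpy as np

def build_tensor(docs):
    """docs: list of documents, each a list of sentence strings.
    Assumed layout: tensor[t, s, d] = count of term t in sentence s of document d.
    Documents with fewer sentences are zero-padded along the sentence axis."""
    vocab = sorted({w for doc in docs for sent in doc for w in sent.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    max_sents = max(len(doc) for doc in docs)
    tensor = np.zeros((len(vocab), max_sents, len(docs)))
    for d, doc in enumerate(docs):
        for s, sent in enumerate(doc):
            for w in sent.lower().split():
                tensor[index[w], s, d] += 1.0
    return tensor

def summarize(docs, n_topics=2):
    """Pick one sentence for each of the n_topics strongest term-space directions."""
    tensor = build_tensor(docs)
    n_terms, n_slots, n_docs = tensor.shape
    # Mode-1 unfolding: one column per (sentence slot, document) pair.
    unfolded = tensor.reshape(n_terms, n_slots * n_docs)
    # Left singular vectors act as "topics"; singular values (descending) rank them.
    u, sing, _ = np.linalg.svd(unfolded, full_matrices=False)
    chosen, summary = set(), []
    for k in range(min(n_topics, len(sing))):
        # Strength of topic k in every sentence slot.
        scores = np.abs(u[:, k] @ unfolded).reshape(n_slots, n_docs)
        # Exclude padded slots and sentences that were already selected.
        for d, doc in enumerate(docs):
            scores[len(doc):, d] = -1.0
        for s, d in chosen:
            scores[s, d] = -1.0
        s, d = np.unravel_index(np.argmax(scores), scores.shape)
        if scores[s, d] < 0:
            break  # no unselected sentences remain
        chosen.add((int(s), int(d)))
        summary.append(docs[d][s])
    return summary
```

Calling `summarize(docs, n_topics=2)` on a list of tokenizable documents returns up to two sentences, one per top-ranked direction. A real system would also need proper tokenization, term weighting, and redundancy control, none of which the abstract describes.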
Similar resources
Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
We present a dual decomposition framework for multi-document summarization, using a model that jointly extracts and compresses sentences. Compared with previous work based on integer linear programming, our approach does not require external solvers, is significantly faster, and is modular in the three qualities a summary should have: conciseness, informativeness, and grammaticality. In additio...
A Multi-Document Multi-Lingual Automatic Summarization System
In this paper, a new multidocument multi-lingual text summarization technique, based on singular value decomposition and hierarchical clustering, is proposed. The proposed approach relies on only two resources for any language: a word segmentation system and a dictionary of words along with their document frequencies. The summarizer initially takes a collection of related documents, a...
Dimensionality Reduction Aids Term Co-Occurrence Based Multi-Document Summarization
A key task in an extraction system for query-oriented multi-document summarisation, necessary for computing relevance and redundancy, is modelling text semantics. In the Embra system, we use a representation derived from the singular value decomposition of a term co-occurrence matrix. We present methods to show the reliability of performance improvements. We find that Embra performs better with...
Clustered Sub-Matrix Singular Value Decomposition
This paper presents an alternative algorithm based on the singular value decomposition (SVD) that creates vector representation for linguistic units with reduced dimensionality. The work was motivated by an application aimed to represent text segments for further processing in a multi-document summarization system. The algorithm tries to compensate for SVD’s bias towards dominant-topic document...
Multi-Document Summarization Using A* Search and Discriminative Learning
In this paper we address two key challenges for extractive multi-document summarization: the search problem of finding the best scoring summary and the training problem of learning the best model parameters. We propose an A* search algorithm to find the best extractive summary up to a given length, which is both optimal and efficient to run. Further, we propose a discriminative training algorit...
Journal title:
- Computación y Sistemas
Volume 18, Issue
Pages -
Publication date: 2014